Calhoun


Institutional Archive of the Naval Postgraduate School 





Calhoun: The NPS Institutional Archive 
DSpace Repository 


Theses and Dissertations 1. Thesis and Dissertation Collection, all items


1977-03 


Determination of hypothesis testability in 
linear statistical models 


Walls, William Hammond 


Monterey, California. Naval Postgraduate School 


http://hdl.handle.net/10945/18074 


Downloaded from NPS Archive: Calhoun 


Calhoun is the Naval Postgraduate School's public access digital repository for
research materials and institutional publications created by the NPS community.
Calhoun is named for Professor of Mathematics Guy K. Calhoun, NPS's first
appointed, and published, scholarly author.


Dudley Knox Library / Naval Postgraduate School
411 Dyer Road / 1 University Circle
Monterey, California USA 93943





http://www.nps.edu/library 


NAVAL POSTGRADUATE SCHOOL 


Monterey, California 


DETERMINATION OF HYPOTHESIS TESTABILITY IN LINEAR 
STATISTICAL MODELS 


BY 


William Hammond Walls 


March 1977 





Approved for public release; distribution unlimited. 


1178050 





UNCLASSIFIED
SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered)


REPORT DOCUMENTATION PAGE
(READ INSTRUCTIONS BEFORE COMPLETING FORM)


1.   REPORT NUMBER:
2.   GOVT ACCESSION NO.:
3.   RECIPIENT'S CATALOG NUMBER:
4.   TITLE (and Subtitle): Determination of Hypothesis Testability
     in Linear Statistical Models
5.   TYPE OF REPORT & PERIOD COVERED: Master's Thesis; March 1977
6.   PERFORMING ORG. REPORT NUMBER:
7.   AUTHOR(s): William Hammond Walls
8.   CONTRACT OR GRANT NUMBER(s):
9.   PERFORMING ORGANIZATION NAME AND ADDRESS:
     Naval Postgraduate School, Monterey, California 93940
10.  PROGRAM ELEMENT, PROJECT, TASK AREA & WORK UNIT NUMBERS:
11.  CONTROLLING OFFICE NAME AND ADDRESS:
     Naval Postgraduate School, Monterey, California 93940
12.  REPORT DATE: March 1977
13.  NUMBER OF PAGES: 50
14.  MONITORING AGENCY NAME & ADDRESS (if different from Controlling Office):
     Naval Postgraduate School, Monterey, California 93940
15.  SECURITY CLASS. (of this report): Unclassified
15a. DECLASSIFICATION/DOWNGRADING SCHEDULE:
16.  DISTRIBUTION STATEMENT (of this Report):
     Approved for public release; distribution unlimited
17.  DISTRIBUTION STATEMENT (of the abstract entered in Block 20, if different from Report):
18.  SUPPLEMENTARY NOTES:
19.  KEY WORDS (Continue on reverse side if necessary and identify by block number):
     Analysis of variance, linear model, hypothesis testing, estimability,
     statistics, testability, generalized inverse, design of experiments
20.  ABSTRACT (Continue on reverse side if necessary and identify by block number):

     Analysts conducting experiments must frequently deal with situations in
     which data is incomplete or missing. This creates problems that can seriously
     affect classical hypothesis testing by introducing extraneous terms into the
     hypothesis in a complicated way. A technique exists that allows an analyst to
     determine precisely which experimental terms are actually present in a proposed
     hypothesis and what that hypothesis would actually be testing if
     employed. This paper examines the mathematics underlying the technique and
     applies the theory to a widely used data analysis computer package. A
     computer program is presented to facilitate implementation of the method.


DD FORM 1473 (1 JAN 73), EDITION OF 1 NOV 65 IS OBSOLETE
S/N 0102-014-6601
UNCLASSIFIED
SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered)





Approved for public release; distribution unlimited


DETERMINATION OF HYPOTHESIS TESTABILITY IN LINEAR
STATISTICAL MODELS


by


William Hammond Walls
Lieutenant-Commander, United States Navy
B. S., United States Naval Academy, 196


Submitted in partial fulfillment of the
requirements for the degree of


MASTER OF SCIENCE IN OPERATIONS RESEARCH


from the
NAVAL POSTGRADUATE SCHOOL
March 1977





ABSTRACT 


Analysts conducting experiments must frequently
deal with situations in which data is incomplete or
missing. This creates problems that can seriously
affect classical hypothesis testing by introducing
extraneous terms into the hypothesis in a complicated
way. A technique exists that allows an analyst to
determine precisely which experimental terms are
actually present in a proposed hypothesis and what
that hypothesis would actually be testing if employed.
This paper examines the mathematics underlying the
technique and applies the theory to a widely used data
analysis computer package. A computer program is
presented to facilitate implementation of the method.
I. EXECUTIVE SUMMARY


A. PROBLEM


At times even the most carefully designed and executed
experiments can be plagued with aborted tests or missing
data. Such unbalance in the data can have a significant
impact upon the mathematical structure of the analytic
techniques used in analysis of variance. In addition to
increasing the complexity of computations, unbalanced design
can also seriously affect hypothesis testing. Because of
lack of balance, hypotheses purporting to test the influence
of a main effect, for example, may be hopelessly confounded
with interaction terms. Blindly "testing" such confounded
hypotheses without an appreciation of the level of pollution
from extraneous terms can lead to serious error in
interpreting results. It is desirable to find a general
procedure for use with analysis of variance that can
determine exactly what a proposed hypothesis is testing in
terms of the main effects and interactions.


B. APPROACH 


Because of its mathematical power and notational
simplicity, the matrix form of the linear model Y = Xb + e
is used in deriving a solution to the problem. The linear
model leads to the "normal equations" X'Xb = X'Y. Since X'X
is in general not of full rank, any solution (b^0) for b is
not unique. Further, (X'X)^-1 does not exist; one must turn to
the concept of a generalized inverse G of X'X. It can be
shown that testing a hypothesis H: q'b = m involves
expressing the hypothesis as a linear function q'GX'X of the
generalized inverse (G) and X'X. While determination of
q'GX'X is frequently a non-trivial manual calculation, it
can be handled easily on a computer.
C. SOLUTION


If an analyst needs to test a particular hypothesis it
is possible that additional, undesired terms may be
polluting the hypothesis to such a degree that his
interpretation of test results may be completely invalid. By
computing the value of q'GX'X he will be able to determine
precisely what his proposed hypothesis is actually testing.
D. CONCLUSIONS


Recognizing that an unbalanced design can lead to
difficulty in interpreting traditional tests of hypotheses,
it is concluded that:


1. it is mathematically possible to determine the exact
nature of a proposed hypothesis, and


2. such a determination is feasible using a computer.





II. MATHEMATICAL JUSTIFICATION


A. GENERALIZED INVERSE


A generalized inverse of a matrix A is defined to be any
matrix G satisfying

     AGA = A.

It can be shown that, for a given matrix A, G is in general
not unique [Searle, 1971].
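
The definition can be checked numerically. The following is a minimal sketch, not part of the original thesis, assuming NumPy is available. The matrix A, the matrix W and the helper is_generalized_inverse are hypothetical choices made only for illustration. np.linalg.pinv supplies one generalized inverse, and a second is built from it to show that G is not unique.

```python
import numpy as np

# Hypothetical rank-deficient matrix: the third column is the sum of the first two.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0]])

# One generalized inverse: the Moore-Penrose pseudoinverse.
G1 = np.linalg.pinv(A)

# Another: G2 = G1 + (I - G1 A) W also satisfies A G2 A = A for any W,
# because A (I - G1 A) = A - A G1 A = 0.
W = np.arange(9.0).reshape(3, 3)
G2 = G1 + (np.eye(3) - G1 @ A) @ W


def is_generalized_inverse(A, G, tol=1e-9):
    """Return True when A G A reproduces A to within tol."""
    return bool(np.allclose(A @ G @ A, A, atol=tol))


print(is_generalized_inverse(A, G1), is_generalized_inverse(A, G2))
print("G1 and G2 identical:", bool(np.allclose(G1, G2)))
```

Any other routine that returns a matrix satisfying AGA = A could be substituted for the pseudoinverse; the check itself does not depend on how G was obtained.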


B. SOLUTION OF CONSISTENT LINEAR EQUATIONS


The system of linear equations AX = Y is consistent if
any linear relationships existing among the rows of A also
exist among the corresponding elements of Y. Since linear
equations have a solution if and only if they are
consistent, the procedures outlined below are confined to
systems of consistent linear equations.


The following theorems from Searle [1] are stated
without proof in order to develop solution procedures for
consistent equations.


Theorem 1. Consistent equations AX = Y have a solution
X = GY if and only if AGA = A.


Theorem 2. If A has q columns and if G is a generalized
inverse of A, then the consistent equations AX = Y have the
solution

     X^0 = GY + (GA - I)Z

where Z is any arbitrary vector of order q. The notation
indicates that X^0, which satisfies AX = Y, is a solution and
not the general vector of unknowns X.


Theorem 3. For the consistent equations AX = Y, all
solutions are, for any specific G, generated by
X^0 = GY + (GA - I)Z, for arbitrary Z. That is, one need
derive only one generalized inverse of A in order to be able
to develop all solutions to the system AX = Y.
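
The solution family of Theorems 2 and 3 can be illustrated with a short sketch, assuming NumPy. The matrix A, the right-hand side Y and the chosen Z vectors are hypothetical. Each Z produces a different vector X^0, and every one of them satisfies AX = Y.

```python
import numpy as np

# Hypothetical singular system; Y is consistent because its third element
# equals the sum of the first two, matching the row relationship in A.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0]])
Y = np.array([1.0, 2.0, 3.0])

G = np.linalg.pinv(A)  # any generalized inverse of A would serve


def solution(Z):
    """X0 = G Y + (G A - I) Z, the form given in Theorems 2 and 3."""
    return G @ Y + (G @ A - np.eye(A.shape[1])) @ Z


solutions = [solution(np.array(z, dtype=float))
             for z in ([0, 0, 0], [1, 0, 0], [0, 5, -2])]

for x0 in solutions:
    print(x0, "residual:", np.abs(A @ x0 - Y).max())
```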
C. SPECIAL CASE OF SYMMETRIC MATRICES


The linear model used, inter alia, in analysis of
variance involves the system of consistent linear equations

     X'Xb = X'Y

that are solved for b. It is therefore worthwhile to
consider the special case of the symmetric matrix X'X in
some detail. The following development is from Searle [1].


Lemma 1. X'X = 0 implies X = 0.
Proof: This is true because if X'X = 0, the sums of squares
of the elements of each column of X (the diagonal elements
of X'X) equal zero, hence every element must be zero.


Lemma 2. PX'X = QX'X implies that PX' = QX'.
Proof: Apply Lemma 1 to the identity

     (PX'X - QX'X)(P - Q)' = (PX' - QX')(PX' - QX')' = 0.

That is, (PX' - QX')(PX' - QX')' = 0 implies that (PX' - QX') = 0,
which implies that PX' = QX'.
Theorem 4. When G is a generalized inverse of X'X, then
i.   G' is also a generalized inverse of X'X;
ii.  XGX'X = X (i.e., GX' is a generalized inverse of X);
iii. XGX' is invariant to G.


Proof:


(i) By definition

     X'XGX'X = X'X

transposing, X'XG'X'X = X'X, establishing (i).

(ii) From (i), X'XG'X'X = X'X;

by Lemma 2, X'XG'X' = X';

transposing, XGX'X = X, establishing (ii).


(iii) Suppose F is some generalized inverse different
from G. Then XGX'X = X = XFX'X. By Lemma 2, XGX' = XFX'.
That is, XGX' is the same for all generalized inverses of
X'X, establishing (iii).
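
A numerical spot check of Theorem 4 is sketched below, assuming NumPy. The one-way design matrix X (columns: overall mean, level 1, level 2) and the matrix W used to build a second generalized inverse F are hypothetical. The check only illustrates the three statements on one example; it is not a proof.

```python
import numpy as np

# Hypothetical unbalanced one-way layout: columns are (mean, level 1, level 2).
X = np.array([[1.0, 1.0, 0.0],
              [1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0]])
S = X.T @ X                      # X'X, singular because X has rank 2

G = np.linalg.pinv(S)            # one generalized inverse of X'X
W = np.array([[0.0, 1.0, 2.0],
              [3.0, 4.0, 5.0],
              [6.0, 7.0, 8.0]])
F = G + (np.eye(3) - G @ S) @ W  # a second, non-symmetric generalized inverse

part_i = bool(np.allclose(S @ F.T @ S, S))        # F' is a generalized inverse
part_ii = bool(np.allclose(X @ F @ X.T @ X, X))   # X F X'X = X
part_iii = bool(np.allclose(X @ G @ X.T, X @ F @ X.T))  # X G X' = X F X'

print(part_i, part_ii, part_iii)
```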


D. THE LINEAR MODEL


The general linear model is Y = Xb + e where Y is an
n x 1 vector of observations whose components are random and
observable; X is an n x p matrix of experimental design
whose components are real and known; b is a p x 1 vector of
parameters whose components are real and unknown; e is an
n x 1 vector of experimental error whose components are
random and unobservable. The vector e is defined as

     e = Y - E(Y)

so   E(e) = E(Y) - E(E(Y)) = 0,
and  E(Y) = E(Xb) + E(e)
          = Xb.


Every element in e is assumed to have the same variance v^2
and zero covariance with every other element, thus e is
distributed (0, v^2 I) and Y is distributed (Xb, v^2 I). Deriving
the normal equations for the linear model yields

     X'Xb^0 = X'Y

which can be solved for b^0 using the techniques of
generalized inverses described earlier, i.e.,

     b^0 = GX'Y

and  E(b^0) = GX'E(Y)
            = GX'Xb
            = Hb

where H = GX'X.
III. THE CONCEPT OF ESTIMABILITY


A. ESTIMABILITY


As defined by Searle [1], a linear function q'b of the
parameters in b is estimable if it is equal to any linear
function [t'E(Y)] of the expected value of the observations
E(Y). It is important to note that t' is not in general
unique; the only requirement for estimability is that such a
vector exist.


B. PROPERTIES


The definition of estimability leads to four
mathematical properties of immediate importance:


(1) The expected value of any observation is estimable.
In this case t' is a vector with a single element equal to
one; the rest of its elements are zero.


(2) Any linear combination of estimable functions is
estimable. If q'b and r'b are estimable, then q'b = t'E(Y)
and r'b = s'E(Y). Therefore q'b + r'b = t'E(Y) + s'E(Y)
= (t' + s')E(Y), which is estimable.


(3) An alternative form of the condition of estimability
can be developed as follows. If q'b is estimable, then by
definition q'b = t'E(Y), hence q'b = t'Xb. This must hold for
all values of b since the condition of estimability does not
depend on a specific choice of b. This leads to the result
q' = t'X.


(4) When q'b is estimable, q'b^0 is invariant to the
solution used for b^0 because

     q'b^0 = t'Xb^0 = t'XGX'Y.

Since by Theorem 4, XGX' is invariant to G, q'b^0 is
invariant to G and therefore to b^0 when q'b is estimable.


Herein lies the essential importance of estimability: if
q'b is estimable, q'b^0 has the same value for all solutions
b^0. That is, an estimable function is a linear function of
the parameters that is invariant to whatever solution is
used for b^0.
C. THE TEST


A function q'b is estimable if there exists some vector
t' such that q' = t'X. Finding such a vector t' may be a
formidable task with a design of large dimensions. As an
alternative, it is possible to test for estimability by
determining if q'H = q'. Searle [1] shows that q'b is
estimable if and only if q'H = q', as follows.


If q'b is estimable

     q'  = t'X
     q'H = t'XH
     q'H = t'XGX'X

by Theorem 4, GX' is a generalized inverse of X,

hence q'H = t'X
      q'H = q'.

On the other hand, if

     q'H = q'

then q' = q'GX'X

and  q' = t'X, where t' = q'GX'.
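
The test q'H = q' translates directly into a few lines of code. The sketch below assumes NumPy; the design matrix X, the tolerance and the helper names h_matrix and is_estimable are hypothetical and are not taken from HYTEST. For a non-estimable q'b, the product q'H depends on which generalized inverse is used; here it is the Moore-Penrose pseudoinverse.

```python
import numpy as np

# Hypothetical unbalanced one-way layout: columns are (mean, level 1, level 2).
X = np.array([[1.0, 1.0, 0.0],
              [1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [1.0, 0.0, 1.0],
              [1.0, 0.0, 1.0]])


def h_matrix(X):
    """H = G X'X, with G taken as the Moore-Penrose inverse of X'X."""
    S = X.T @ X
    return np.linalg.pinv(S) @ S


def is_estimable(q, H, tol=1e-9):
    """q'b is estimable exactly when q'H = q' (checked to within tol)."""
    return bool(np.allclose(q @ H, q, atol=tol))


H = h_matrix(X)
q_contrast = np.array([0.0, 1.0, -1.0])   # level 1 minus level 2
q_single = np.array([0.0, 1.0, 0.0])      # level 1 effect alone

print(is_estimable(q_contrast, H))   # the contrast lies in the row space of X
print(is_estimable(q_single, H))     # a single effect does not
print(q_single @ H)                  # coefficients of the function q'Hb
```

The last line shows the coefficient vector q'H, which is the quantity the thesis uses to read off what a proposed hypothesis actually involves.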


D. THE CONSTRAINED MODEL


1. Development


The normal equations X'Xb = X'Y form a consistent
system of linear equations where X is of rank r. Because
X'X is, in general, not of full rank, there are many
solution vectors that will satisfy the system. In order to
obtain a particular solution b^0, additional constraints of
the form Cb = 0 are often added to the model. A commonly
used set of constraints satisfies the restrictions

* the main effects sum to zero

* the interaction effects sum to zero across each
  subscript.


Adding the constraints Cb = 0, where the (p-r) rows
of C are linearly independent of the rows of X, yields the
following system of linear equations:

     [ Y ]   [ X ]     [ e ]
     [ 0 ] = [ C ] b + [ 0 ]


The constraint matrix C can be used to transform the design
matrix X into a constrained matrix X* by performing basic
row operations on the system of linear equations:

     [ Y ]   [ X* ]     [ e ]
     [ 0 ] = [ C  ] b + [ 0 ]

Note that X* is the same size (n x p) as X; b remains
unchanged. The practical effect of introducing the
constraints into the design matrix is to make some of the
columns of X* consist entirely of zeros. While b remains
unchanged, the transformation of X into X* has the effect of
"deleting" some elements of the parameter vector by the
mechanism of creating those columns of zeros. Once the
constraints have been integrated into the design matrix,
transforming X into X*, the constraints become redundant and
can be removed from the model by the following technique.
Let A be an (n, n+p-r) matrix such that

     a_ij = 0 if i ≠ j
     a_ij = 1 if i = j.


Then multiplying by A,

     A [ Y ]  =  A [ X* ] b  +  A [ e ]
       [ 0 ]       [ C  ]         [ 0 ]

yields the constrained linear model Y = X*b + e, which is
equivalent to the constrained system above.


Since e is assumed normal (0, v^2 I), Y is also
normal; E(Y) = X*b. The normal equations, X*'X*b = X*'Y, can
be solved for a particular solution b^0 that will also
satisfy the original normal equations X'Xb = X'Y. If G* is
defined as the generalized inverse of X*'X* then b^0 = G*X*'Y
and it follows that G*X*' is a generalized inverse of X*.
Let H* = G*X*'X*.


It is stressed that this constrained linear model
was developed solely for the purpose of finding some
particular solution vector b^0 to the original system of
normal equations. In the discussion that follows, X* is the
same size as X and the parameter vector b is the same in the
constrained linear model as it was in the original linear
model Y = Xb + e.


2. Example


As a simple example of the development of X*, assume
that the design matrix X is

     1  0  0  1
     1  0  1  0
     1  1  0  0

Let C be

     0  1  1  1

Then the stacked matrix of X over C is

     1  0  0  1
     1  0  1  0
     1  1  0  0
     0  1  1  1

Subtracting the bottom row from the top row yields

     1 -1 -1  0
     1  0  1  0
     1  1  0  0
     0  1  1  1

which is the stacked matrix of X* over C.


The appropriate A matrix is

     1  0  0  0
     0  1  0  0
     0  0  1  0

Multiplying yields X*:

     1 -1 -1  0
     1  0  1  0
     1  1  0  0
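
The row operation and the multiplication by A in this example can be reproduced in a few lines. This is a sketch assuming NumPy, with variable names chosen only for illustration; it follows the single row operation of this example and is not a general constraint-absorbing routine.

```python
import numpy as np

# Design matrix X and constraint row C from the example.
X = np.array([[1.0, 0.0, 0.0, 1.0],
              [1.0, 0.0, 1.0, 0.0],
              [1.0, 1.0, 0.0, 0.0]])
C = np.array([[0.0, 1.0, 1.0, 1.0]])

# Stack X over C, then subtract the constraint (bottom) row from the top row.
stacked = np.vstack([X, C])
stacked[0] -= stacked[3]

# A is n x (n + p - r): an identity block followed by zero columns, so that
# multiplying by A keeps the first n rows and drops the constraint row.
n = X.shape[0]
A = np.hstack([np.eye(n), np.zeros((n, C.shape[0]))])
X_star = A @ stacked

print(X_star)
```

The last column of X_star consists entirely of zeros, which is the "deleted" parameter described in the development above.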




E. BIOMEDICAL COMPUTER PROGRAMS


The University of California publishes and maintains the
BIOMED series of standard data analysis packages for use on
digital computers [Dixon, 1976]. One of the programs within
the package, BMD05V, performs computations for analysis of
variance with the linear statistical model. The design
matrix employed is not the same as the design matrix (X) in
that model, however. A user of BMD05V is required to
introduce appropriate additional constraints to permit
computing a particular solution (b^0) for the parameter
vector. It will be shown that techniques applicable to the
design matrix X in the general linear model can be applied
directly to the BMD05V design matrix.
F. APPLICATION TO BMD05V


The constraints, Cb = 0, added to the linear model in
section D above are the type used to generate the BMD05V
design matrix. The resulting matrix X* has the same number
of columns as the original design matrix X, but because some
of these columns are zero, it is possible to suppress them
for arithmetic purposes. For computational simplicity, the
matrix actually used by the computer program deletes all the
zero columns and assumes a corresponding "reparameterized" b
vector of lower dimension. For mathematical rigor, however,
the X* used in the following sections retains the same
number of columns as X. This restriction will be eased when
the matrix X* is actually applied to the computer programs
in Appendix B.
G. ESTIMABILITY IN THE CONSTRAINED MODEL

It can be shown that estimability in the constrained
model Y = X*b + e follows the same pattern as estimability
in the full model Y = Xb + e.


Theorem 5. q'b is estimable if and only if q'H* = q'.
Proof: By definition, q'b is estimable if

     q' = t'X*.

Then q'H* = t'X*G*X*'X*,
     q'H* = t'X* = q',

and if q'H* = q',

     q' = q'G*X*'X*.  Let t' = q'G*X*'.

Then q' = t'X*.

This result allows computations to be performed
directly upon the constrained matrix in order to examine the
estimability of proposed hypotheses. The computer program
HYTEST (Appendix A) can accept either the constrained design
matrix X* (with "zero" columns suppressed) or the standard
design matrix X as an input. If X is used for input, HYTEST
offers the option of using either the constrained matrix X*
or the standard matrix X to compute tests of estimability.


Note that q'H*b is always estimable since q'H*b =
q'G*X*'X*b = q'G*X*'E(Y) = t'E(Y) where t' = q'G*X*'.
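
The effect of the zero column on H* can be seen numerically. The sketch below assumes NumPy and reuses the 3 x 4 constrained matrix from the example in section D. The helper is_estimable and the two q vectors are hypothetical, and G* is taken to be the Moore-Penrose pseudoinverse.

```python
import numpy as np

# Constrained matrix X* from the example in section D (last column all zeros).
X_star = np.array([[1.0, -1.0, -1.0, 0.0],
                   [1.0,  0.0,  1.0, 0.0],
                   [1.0,  1.0,  0.0, 0.0]])

S_star = X_star.T @ X_star
G_star = np.linalg.pinv(S_star)   # one generalized inverse of X*'X*
H_star = G_star @ S_star          # H* = G* X*'X*


def is_estimable(q, H, tol=1e-9):
    """Theorem 5 check: q'b is estimable when q'H* = q' (within tol)."""
    return bool(np.allclose(q @ H, q, atol=tol))


q_kept = np.array([0.0, 1.0, 0.0, 0.0])      # parameter with a nonzero column
q_deleted = np.array([0.0, 0.0, 0.0, 1.0])   # parameter whose column is zero

print(np.round(H_star, 10))
print(is_estimable(q_kept, H_star), is_estimable(q_deleted, H_star))
```

With this choice of G*, H* has ones on the diagonal for the three retained parameters and a zero row and column for the deleted one, so any q' with a nonzero entry in the deleted position fails the test.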


H.  TESTABILITY 


From Searle [1], all linear hypotheses can be
handled by a general procedure; specific hypotheses are then
considered to be applications of this general procedure.


The general hypothesis may be written as H: K'b = m
where K' is a matrix of s rows and p columns. The only
restriction on K' is that it have full row rank. That is, the
hypothesis must be composed of linearly independent
functions of the parameter vector.


To review analysis of variance briefly, classical
techniques rely upon the ratio of two independent Chi-square
distributions, each divided by its respective degrees of
freedom, to generate an F statistic. The sum of squares
explained by the model if the hypothesis is assumed true,
divided by its degrees of freedom, forms the numerator of
the F statistic. For many situations, the denominator is the sum
of squares for error divided by its degrees of freedom.
Each sum of squares can be conveniently represented by
appropriate quadratic forms which must meet certain
requirements in order to be Chi-square distributed.


Searle's derivation of a test of the general
hypothesis depends upon k_i'b being estimable for every row
k_i' of K'. If this assumption is satisfied, the quadratic form

     Q/v^2 = (K'b^0 - m)'(K'GK)^-1 (K'b^0 - m)/v^2

is distributed non-central Chi-square and has rank s. The
sum of squares for error can be shown to be

     SSE = (Y - XK(K'K)^-1 m)'(I - XGX')(Y - XK(K'K)^-1 m).

Q and SSE are independent so F(H) = (Q/s)/(SSE/(n-r)) is
distributed non-central

     F'(s, n-r, (K'b - m)'(K'GK)^-1 (K'b - m)/2v^2).

The test statistic is

     F(H) = (K'b^0 - m)'(K'GK)^-1 (K'b^0 - m) / (s SSE/(n-r))

which is F distributed with s and n-r degrees of freedom
under the null hypothesis.
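
The quantities Q, SSE and F(H) can be computed directly from the formulas above. The following is a minimal sketch assuming NumPy; the data Y, the design matrix X, the hypothesis K'b = m and the function name f_statistic are hypothetical, and the generalized inverse is taken to be the Moore-Penrose pseudoinverse. The hypothesis used here (level 1 effect equals level 2 effect) is estimable, so the test is valid.

```python
import numpy as np


def f_statistic(X, Y, Kt, m):
    """Return (Q, SSE, F) for H: K'b = m, following the formulas in the text."""
    n = X.shape[0]
    r = np.linalg.matrix_rank(X)
    s = Kt.shape[0]
    G = np.linalg.pinv(X.T @ X)          # a generalized inverse of X'X
    b0 = G @ X.T @ Y                     # a solution of the normal equations
    d = Kt @ b0 - m
    Q = d @ np.linalg.solve(Kt @ G @ Kt.T, d)
    K = Kt.T
    z = Y - X @ K @ np.linalg.solve(Kt @ K, m)
    SSE = z @ (np.eye(n) - X @ G @ X.T) @ z
    return float(Q), float(SSE), float((Q / s) / (SSE / (n - r)))


# Hypothetical one-way data: two observations at level 1, three at level 2.
X = np.array([[1.0, 1.0, 0.0],
              [1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [1.0, 0.0, 1.0],
              [1.0, 0.0, 1.0]])
Y = np.array([3.0, 5.0, 6.0, 8.0, 10.0])
Kt = np.array([[0.0, 1.0, -1.0]])        # K' for "level 1 = level 2"
m = np.array([0.0])

Q, SSE, F = f_statistic(X, Y, Kt, m)
print(Q, SSE, F)
```

For this data set Q equals the usual between-groups sum of squares and SSE the within-groups sum of squares, so the result agrees with a textbook one-way analysis of variance.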


3. Determination of Testability


Suppose that K'b is not estimable. Then the
hypothesis H: K'b = m is not testable. Assuming that
(K'GK)^-1 exists, if one were to compute F(H), what
hypothesis is actually being tested? When working with the
constrained linear model Y = X*b + e, the answer is
"K'H*b = m." The derivation which follows closely parallels
the procedure used by Searle [1] for the linear model
Y = Xb + e.


The hypothesis H: K'H*b = m is testable since K'H*b
is estimable for each row k_i'. The appropriate numerator
quadratic form is

     Q_1 = (K'H*b^0 - m)'(K'H*G*H*'K)^-1 (K'H*b^0 - m).

But K'H*b^0 = K'G*X*'X*G*X*'Y and since

     X*G*X*' = X*G*'X*'

it follows

     K'H*b^0 = K'G*X*'X*G*'X*'Y.

Let G_1* = G*X*'X*G*' (which is a generalized inverse of
X*'X*). Then

     K'H*b^0 = K'G_1*X*'Y
     K'H*b^0 = K'b_1^0

where b_1^0 = G_1*X*'Y is a solution to X*'X*b = X*'Y
obtained from using G_1*.

Also K'H*G*H*'K = K'G*X*'X*G*X*'X*G*'K
                = K'G*X*'X*G*'K
                = K'G_1*K.

Therefore Q_1 reduces to

     Q_1 = (K'b_1^0 - m)'(K'G_1*K)^-1 (K'b_1^0 - m)

which is the quadratic form that would result from
attempting to test the non-testable hypothesis K'b = m using
the solution b_1^0 = G_1*X*'Y. These calculations are
indistinguishable from those that would be performed in
testing the testable hypothesis K'H*b = m.
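
The equality of the two quadratic forms can be spot-checked numerically. The sketch below assumes NumPy and, for brevity, works with an unconstrained hypothetical design matrix X; the algebra is the same with X* and G* in place of X and G. The data Y, the matrix W used to build a non-symmetric generalized inverse, and the helper quad_form are assumptions made for illustration.

```python
import numpy as np

# Hypothetical one-way layout and data.
X = np.array([[1.0, 1.0, 0.0],
              [1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [1.0, 0.0, 1.0],
              [1.0, 0.0, 1.0]])
Y = np.array([3.0, 5.0, 6.0, 8.0, 10.0])
S = X.T @ X

# A deliberately non-symmetric generalized inverse G of X'X.
P = np.linalg.pinv(S)
W = np.array([[0.0, 0.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])
G = P + (np.eye(3) - P @ S) @ W

Kt = np.array([[0.0, 1.0, 0.0]])   # K'b = level 1 effect alone: not estimable
m = np.array([0.0])


def quad_form(d, M):
    """Return d' M^-1 d."""
    return float(d @ np.linalg.solve(M, d))


# Quadratic form for the testable hypothesis K'Hb = m, using G.
H = G @ S
b0 = G @ X.T @ Y
Q_testable = quad_form(Kt @ H @ b0 - m, Kt @ H @ G @ H.T @ Kt.T)

# Quadratic form for the non-testable K'b = m, using G1 = G X'X G'.
G1 = G @ S @ G.T
b1 = G1 @ X.T @ Y
Q_nontestable = quad_form(Kt @ b1 - m, Kt @ G1 @ Kt.T)

print(Q_testable, Q_nontestable)
print("coefficients actually tested:", Kt @ H)
```

The two printed values agree to rounding error, while the coefficient row K'H differs from K', which is the situation the derivation describes.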
A. CONCLUSIONS


1. A given linear function of the parameter vector (q'b)
is estimable if and only if q'H = q'.


2. Since only estimable functions are testable, if
q'H ≠ q' then the hypothesis actually being tested is not
H: q'b = m but rather H: q'Hb = m.


3. The mathematics developed for proving both of the
preceding conclusions can also be applied to a constrained
design matrix, such as that used in the BMD05V program, to
allow determination of estimability directly. That is,


a. the function q'b is estimable if and only if
q'H* = q', and


b. if q'b is not estimable then the hypothesis
H: q'b = m is actually testing H: q'H*b = m.


B. SIGNIFICANCE


The results presented above afford a mathematical
justification of the need for a computer program to
determine the testability of proposed hypotheses. A sample
program, HYTEST, is presented in Appendix D. Using the
HYTEST program permits an analyst to determine not only
whether each of his hypotheses is testable, but also
precisely what main effects and interactions confound each
hypothesis that is not testable. Such information can be
used to make more informed decisions, a priori, on the
design of experiments, and a posteriori on the analysis and
interpretation of experimental results. Knowing the nature
and degree of confounding may not necessarily ease decision
making. It can, however, help to ensure that once a hypothesis is
accepted or rejected the analyst is aware of the degree of
"purity" of his conclusions concerning the effects of
various factors in the experiment. If used, these techniques
can prevent an analyst from complacently assuming that he is
testing one hypothesis when he is, in fact, testing
something quite different.
APPENDIX A


HYTEST


HYTEST is a FORTRAN IV program that facilitates
determining if a proposed hypothesis is testable. The User's
Guide contains operating instructions and a list of options.


FLOWCHART


[Flowchart of HYTEST. Legible box labels, in order: INPUT DATA;
DETERMINE TYPE OF INPUT MATRIX, branching to READ BIOMED MATRIX or
READ STANDARD DESIGN MATRIX; INPUT EACH AUXILIARY EQUATION IN TURN
AND ...; OUTPUT BMD05V DESIGN CARDS; COMPUTE GENERALIZED INVERSE G;
CHECK ACCURACY OF ...; DETERMINE IF ANY HYPOTHESES ARE TO BE TESTED
(YES/NO); ENSURE Q'GQ IS NON-SINGULAR; if not, CANNOT TEST
HYPOTHESIS; otherwise determine whether the HYPOTHESIS is TESTABLE.]
SUBROUTINE 


In the interest of computational accuracy and speed, 
HYTEST uses subroutine LPSDOR from the International 
Mathematical and Statistical Library (IMSL) to compute a 
generalized inverse G. This is a proprietary subroutine. 
Under the licensing agreement, its code may not be 
distributed to or used by a user outside the Naval 
Postgraduate School. It may not be used on a non-NPS 


computer system. 





APPENDIX B


USER'S GUIDE


This guide contains complete operating instructions for
using the computer program HYTEST.


The constrained design matrix X* actually used in the
computer programs BMD05V and HYTEST is the same X*
introduced in Chapter III except that all zero columns have
been deleted to facilitate computation. This has the effect
of reducing the parameter vector b; an analyst using these
programs is cautioned to ensure that he is aware of which
elements have been "deleted" from the parameter vector. See
Chapter III, sections D, E and F, for further information.


OVERVIEW OF COMPUTER PROGRAM HYTEST


The program HYTEST exploits techniques enumerated
earlier in this paper to determine if selected hypotheses of
the form H: k'b = m are testable within the framework of
a specified linear model. If an hypothesis is not testable,
HYTEST computes an algebraic form of the parameter vector
that would actually be tested if the proposed hypothesis
were to be employed.

Given the standard design matrix (X) from the linear
model, and appropriate additional constraints, HYTEST can
compute the design matrix (X*) for use in the BMD05V
program. If desired, DESIGN cards for use in the BIOMED
package can be produced as an auxiliary output. If the
BIOMED option is exercised, HYTEST performs all of its
calculations upon X*, making it essential for the user to be
aware of the exact structure not only of the original design
matrix X, but also of the constrained matrix X*. The user
can select the option to have all calculations performed on
X if preferred.


INPUT OPTIONS


The program can accept either the standard design matrix
from the basic linear model, or a BMD05V design matrix.


OUTPUT OPTIONS


1. Hypotheses


The major option for output concerns the testability
of user defined hypotheses. For any number of hypotheses
from zero to 99, HYTEST will determine if each hypothesis is
testable, and if not testable, the program computes what
would actually be tested if the specified hypothesis were to
be employed. This option is automatically suppressed if
there are no hypotheses to be tested.

2. Design Cards


If a BIOMED design matrix is computed, the user can
select an option that will punch appropriate DESIGN cards
for BMD05V; a printed replica of the cards is also produced.
If punched cards are not required, the user can opt for the
printed form of the DESIGN cards without having them
punched. This output option is only available if a standard
design matrix X is used for input.


3. Accuracy


An estimate of the accuracy with which HYTEST is
computing the generalized inverse matrix G can be obtained
upon request. The output consists of the matrix resulting
from subtracting X'XGX'X from X'X. If the computer were
perfectly accurate, all entries would be zero. Because of
arithmetic inaccuracies in computing G, the entries are
frequently not zero. Since the matrix H = GX'X is used in
assessing the testability of hypotheses, X'X - X'XGX'X
affords an estimate of the accuracy to be expected when
determining the nature of the hypothesis being tested.
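
The accuracy check amounts to forming the residual matrix X'X - X'XGX'X and inspecting its largest entry. The sketch below assumes NumPy; the design matrix, the function name accuracy_residual and the deliberately perturbed inverse are hypothetical and do not reproduce the HYTEST output format.

```python
import numpy as np


def accuracy_residual(X, G):
    """Return X'X - X'X G X'X, which is all zeros for an exact G."""
    S = X.T @ X
    return S - S @ G @ S


# Hypothetical unbalanced one-way layout.
X = np.array([[1.0, 1.0, 0.0],
              [1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [1.0, 0.0, 1.0],
              [1.0, 0.0, 1.0]])

G = np.linalg.pinv(X.T @ X)
G_poor = G + 1e-3                 # simulate an inaccurately computed inverse

good = np.abs(accuracy_residual(X, G)).max()
poor = np.abs(accuracy_residual(X, G_poor)).max()
print(good, poor)
```

A residual near machine precision suggests H = GX'X can be trusted to the printed precision; a visibly nonzero residual signals that apparent departures of q'H from q' may be arithmetic noise rather than genuine non-estimability.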


4. H Matrix


The H matrix used in determining the testability of
selected hypotheses will be printed if requested. Selecting
this option will generate printed output for all of the
previous options as well.

5. Generalized Inverse G


The final option prints out the generalized inverse
of X'X used in computations within HYTEST. This option
automatically includes printed output for all previous
options as well.


INPUT REQUIREMENTS


Input cards must be in the following order:


     PROBLM card          (required)
     Design matrix cards  (required)
     AUXEQN card          (required if standard
                           design matrix input option is used)
     Auxiliary equations  (optional)
     Hypotheses           (optional)
     FINISH card          (required after last problem)


The HYTEST PROBLM card, based upon a similar card
used in the BMD05V program, is used to set up various
program parameters and options.


DATA                                  COLUMNS   RESTRICTIONS
PROBLM                                1-6
User's optional problem number        7-8
Number of design card sets            9-11      1 <= ND <= 150
Number of columns in design matrix    12-13     1 <= NC <= 60
Blank                                 14-15
Number of hypotheses                  16-17     0 <= NH <= 99
Blank                                 18-26
Output options                        27
Blank                                 28-71
Input options                         72        0 = X matrix
                                                1 = X* matrix

Output options are selected from the following
table.

IF COLUMN 27 CONTAINS    THE PRINTED OUTPUT WILL INCLUDE
        0                A. An assessment of what each
                            hypothesis is testing
        1                B. Design cards (if the BIOMED
                            output option is being used)
                            plus option 0
                         C. Accuracy of coefficients and
                            option 0
        2                D. The H matrix (H=GX'X),
                            plus all output from options
                            0 and 1
        3                E. The generalized inverse G
                            plus all previous options.

3. design matrix cards (required)


a. standard design matrix


One design card is required for each unique row of
the design matrix. The first 2 columns of card i contain the
number of rows in the design matrix (right justified) that
are identical to row i. The next 60 columns (3-62) are
reserved for the columns of the design matrix. Enter a zero
or one in the appropriate card column. Each column
corresponds to a column in the design matrix (not to exceed
a total of 60 columns).
example
If a hypothetical design matrix were:

1 1 0 0
1 1 0 0
1 0 1 0
1 0 1 0
1 0 1 0
1 0 0 1

Only three design cards are required:
021100   (columns 1-6)
031010
011001
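
The card layout described above can be sketched in a few lines. The function name and the use of a dictionary to count identical rows are illustrative assumptions; only the layout itself (a two-column right-justified row count followed by the zero/one entries) comes from the text.

```python
def design_cards(rows):
    """Build one card per unique row: 2-column row count, then the 0/1 entries."""
    counts = {}
    for row in rows:
        key = "".join(str(v) for v in row)
        counts[key] = counts.get(key, 0) + 1  # dicts keep first-seen order
    return [f"{n:02d}{key}" for key, n in counts.items()]


# Design matrix with six rows but only three unique rows.
matrix = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 1, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
]

cards = design_cards(matrix)
for card in cards:
    print(card)
```

Counting identical rows is what keeps the deck short: the number of cards equals the number of distinct rows, not the number of observations.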


b. BMD05V design matrix


Enter the BIOMED design matrix (without data cards)
exactly as it is used in the BMD05V package.


4. AUXEQN card (required if the standard design matrix
input option is used)


DATA                             COLUMNS   RESTRICTIONS
AUXEQN                           1-6
Number of auxiliary equations    7-8       0 <= NA <= 99
Punched card output option       9-11      1 = Design cards punched
                                           0 = Design cards not punched

Caution: Do not use this card if the BIOMED input option is
exercised.

If there are no auxiliary equations, the program
will perform its computations on the standard design matrix.
None of the BIOMED output options are then available.
Auxiliary equations are algebraic equivalents of
BMD05V constraints. For the example to be used in this
section assume the linear model


$y_{ij} = \mu + a_i + e_{ij}, \quad i = 1, 2, 3$


Each auxiliary equation (m) requires a separate
set of cards. The first card contains the number
of parameters in equation m. As an example, the auxiliary
equation


$a_1 + a_2 + a_3 = 0$


contains three parameters so the first card for this
equation would contain 03 in the first two columns.


The second card contains the column number and the
coefficient of the parameter whose column is to be deleted
from the BMD05V matrix. For instance, the parameter vector
in this example is


$b' = (\mu, a_1, a_2, a_3)$


Parameter $a_3$ corresponds to the fourth column of the design
matrix X. In order to eliminate $a_3$ from the BMD05V matrix,
the second card must contain:


0401


in the first four columns.


A separate card is required for each parameter in
the auxiliary equation, so this example needs two more
cards:

0201
0301


corresponding to the columns and coefficients for $a_1$ and $a_2$
respectively.


Caution: if a parameter's column is to be deleted
from the BMD05V design matrix, that parameter must not be
used in an auxiliary equation; its algebraic equivalent must
be used instead. A complete set of cards must be included
for each separate auxiliary equation (m cards for each
auxiliary equation with m parameters).


The above input will produce the following BMD05V
design cards:
DESIGN002001001000
DESIGN003001000001
DESIGN0010010-10-1

6. hypothesis cards (optional)


The format for hypothesis cards is similar to that
for the auxiliary equation cards. Assume that the hypothesis
of interest is:


$H: a_1 - a_2 = 0$


There are two parameters in the hypothesis ($a_1$ and
$a_2$) corresponding to columns 2 and 3 in the constrained
BMD05V matrix. It is important to note that the parameters
used in hypothesis testing must be associated with the
matrix used to compute H = GX'X. If the standard design
matrix X is used throughout the program, the column
corresponding to a specific parameter is unchanged from the
original model formulation. If, on the other hand, the
standard design matrix X is used for input, and auxiliary
equations are used to reduce X to the BMD05V matrix X*, then
the columns used in the hypothesis cards must be the
appropriate columns from X*. The first card, for this
example hypothesis, contains 02 (for two parameters) in the
first two columns. The second and third cards contain

021.0

03-1.0

respectively. The first two columns identify the
parameter's column in the appropriate design matrix. The
parameter's coefficient, a decimal point and, if
appropriate, a minus sign, must be punched in columns 3
through 11.
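
A minimal sketch of reading a set of hypothesis cards is given below. It assumes the layout just described (parameter count in columns 1-2 of the first card; on each later card the design matrix column in columns 1-2 and the signed decimal coefficient in columns 3 through 11) and an example hypothesis with coefficients +1 and -1 on columns 2 and 3. The function names are hypothetical.

```python
def parse_hypothesis(cards):
    """Return a list of (column, coefficient) pairs from one set of hypothesis cards."""
    n_params = int(cards[0][:2])
    terms = []
    for card in cards[1:1 + n_params]:
        column = int(card[0:2])          # card columns 1-2
        coefficient = float(card[2:11])  # card columns 3-11
        terms.append((column, coefficient))
    return terms


def hypothesis_vector(terms, n_columns):
    """Expand the (column, coefficient) pairs into the full vector q'."""
    q = [0.0] * n_columns
    for column, coefficient in terms:
        q[column - 1] = coefficient
    return q


cards = ["02", "021.0", "03-1.0"]
terms = parse_hypothesis(cards)
q = hypothesis_vector(terms, 3)
print(terms)
print(q)
```

The resulting vector q' is what is compared against q'H when testability is assessed.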


7. FINISH card (required after last problem)


If several problems are to be run in sequence, the
cards for each problem are to be grouped in sequential
blocks. Once HYTEST finishes a problem it determines if
another problem is to be run. If so, it executes that
problem. It will continue executing problems in sequence
until all problems are completed and a FINISH card is
encountered. The FINISH card must be the last card in the
data deck and the word FINISH must be punched in the first
six columns.
APPENDIX C


EXAMPLE 


A. THE MODEL 


The use of HYTEST can best be illustrated through use of
an example. The one chosen is from Searle [1].


The model is:


$Y_{ijk} = M + A_i + B_j + AB_{ij} + E_{ijk}$

$i = 1, 2, 3; \quad j = 1, 2, 3, 4; \quad k = 1, \ldots, n_{ij}$


$Y_{ijk}$ is the kth observation of the ith treatment of the jth
type; $M$ is the mean effect; $A_i$ is the effect of the ith
treatment; $B_j$ is the effect of the jth type; $AB_{ij}$ is the
interaction between the ith treatment and the jth type and
$E_{ijk}$ is the error term. The number of observations noted
for each cell is shown in the following table:


       j=1   j=2   j=3   j=4
i=1     3     0     1     2
i=2     2     2     0     0
i=3     0     2     2     4

Suppressing the zero columns for $AB_{12}$, $AB_{23}$, $AB_{24}$ and
$AB_{31}$ in the X matrix, the input design matrix for HYTEST is


031100100010000000
011100001001000000
021100000100100000
021010100000010000
021010010000001000
021001010000000100
021001001000000010
041001000100000001
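
As a cross-check, the design cards above can be decoded back into cell counts. The sketch assumes the sixteen matrix columns are ordered as the mean, three treatment columns, four type columns and eight interaction columns; that ordering is inferred from the cards, and the function name is hypothetical.

```python
cards = [
    "031100100010000000",
    "011100001001000000",
    "021100000100100000",
    "021010100000010000",
    "021010010000001000",
    "021001010000000100",
    "021001001000000010",
    "041001000100000001",
]


def decode(card):
    """Split a card into its row count and the 0/1 design matrix row."""
    return int(card[:2]), [int(c) for c in card[2:]]


cell_counts = {}
for card in cards:
    count, row = decode(card)
    treatment = row[1:4].index(1) + 1  # matrix columns 2-4 (assumed A_1..A_3)
    kind = row[4:8].index(1) + 1       # matrix columns 5-8 (assumed B_1..B_4)
    cell_counts[(treatment, kind)] = count

total = sum(cell_counts.values())
print(cell_counts)
print(total)
```

Under the assumed column ordering, the four cells that never appear, (1,2), (2,3), (2,4) and (3,1), are exactly the interaction terms whose columns were suppressed.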
B. CARD PREPARATION


The size of the BMD05V matrix is controlled by the type
of constraints used in the BIOMED package. In general, the
number of parameters for each main effect and each
interaction effect is reduced by 1 in each dimension. In
this example, 1 parameter is deleted for each of the main
effects, A and B. Since i = 1,2,3 and j = 1,2,3,4 there are
3 x 4 = 12 interaction terms. Reducing each dimension by 1
leaves i = 1,2 and j = 1,2,3 for 2 x 3 = 6 remaining
interaction terms. As noted in the preparation of the input
design matrix above, the columns for four interaction terms
are zero. Those four terms are therefore deleted from the
model. Two other interaction and two main effect terms can
be deleted by addition of the usual BMD05V constraints. For
this example, the following constraints were adopted:


(1) $A_1 + A_2 + A_3 = 0$

(2) $B_1 + B_2 + B_3 + B_4 = 0$

(3) $AB_{11} + AB_{12} + AB_{13} + AB_{14} = 0$

(4) $AB_{31} + AB_{32} + AB_{33} + AB_{34} = 0$


In auxiliary equation (1) it is desired to delete
parameter $A_3$ which corresponds to column 4 of the design
matrix. Since there are three parameters in this equation
the first card is:
03

The other cards are:
0401
0201
0301


This completes the cards for auxiliary equation (1).


Auxiliary equation (2) is quite similar. If $B_4$ is chosen
for deletion the cards are:
04
0801
0701
0601
0501


Auxiliary equation (3) contains the parameter $AB_{12}$ which
does not appear in the design matrix (it is a column of
0's). Under the BMD05V constraints, it has an algebraic
equivalent, $AB_{12} = -AB_{22} - AB_{32}$, which must be used in its
place. Equation (3) then becomes


(3a) $AB_{11} - AB_{22} - AB_{32} + AB_{13} + AB_{14} = 0$.


The input cards used to delete $AB_{14}$ from the design matrix
are:


05
1101
13-1
14-1
1001
0901

The interaction term $AB_{31}$, from equation (4), does not
appear in the design matrix. Substituting $AB_{31} = -AB_{11} -
AB_{21}$ into equation (4) yields:


(4a) $-AB_{11} - AB_{21} + AB_{32} + AB_{33} + AB_{34} = 0$.


The input cards used to delete $AB_{34}$ from the design matrix
are:


05
1601
09-1
12-1
1401
1501

Deleting a total of eight parameters from the standard
design matrix will yield a BMD05V matrix of twelve non-zero
columns.

Column:     1  2    3    4    5    6    7      8      9      10     11     12
Parameter:  M  A_1  A_2  B_1  B_2  B_3  AB_11  AB_13  AB_21  AB_22  AB_32  AB_33
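
The reduction from sixteen input columns to twelve can be traced with a short sketch. It applies auxiliary equations (1), (2), (3a) and (4a) as column substitutions, which is a reading of the procedure described here and not the HYTEST source; the function name and the (column, coefficient) representation are assumptions, and the sixteen columns are assumed to be ordered M, A_1-A_3, B_1-B_4, AB_11, AB_13, AB_14, AB_21, AB_22, AB_32, AB_33, AB_34.

```python
from fractions import Fraction


def reduce_matrix(rows, equations):
    """Apply each auxiliary equation, then drop the deleted columns.

    Each equation is a list of (column, coefficient) pairs with 1-based
    columns; the first pair names the column to delete. A deleted column
    must not appear as a remaining term in any other equation.
    """
    work = [[Fraction(v) for v in row] for row in rows]
    deleted = []
    for terms in equations:
        del_col, del_coef = terms[0]
        deleted.append(del_col - 1)
        for row in work:
            for col, coef in terms[1:]:
                row[col - 1] -= Fraction(coef, del_coef) * row[del_col - 1]
    keep = [j for j in range(len(work[0])) if j not in deleted]
    # All coefficients here are +1 or -1, so every entry stays integral.
    return [[int(row[j]) for j in keep] for row in work]


cards = [
    "031100100010000000", "011100001001000000", "021100000100100000",
    "021010100000010000", "021010010000001000", "021001010000000100",
    "021001001000000010", "041001000100000001",
]
rows = [[int(c) for c in card[2:]] for card in cards]

equations = [
    [(4, 1), (2, 1), (3, 1)],                        # (1), delete A_3
    [(8, 1), (7, 1), (6, 1), (5, 1)],                # (2), delete B_4
    [(11, 1), (13, -1), (14, -1), (10, 1), (9, 1)],  # (3a), delete AB_14
    [(16, 1), (9, -1), (12, -1), (14, 1), (15, 1)],  # (4a), delete AB_34
]

reduced = reduce_matrix(rows, equations)
for row in reduced:
    print(row)
```

Each reduced row has twelve entries, in the column order M, A_1, A_2, B_1, B_2, B_3, AB_11, AB_13, AB_21, AB_22, AB_32, AB_33.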


C. HYPOTHESES


To ascertain that $A_1 - A_2 + AB_{11} - AB_{21}$ is estimable but
that $A_1 - A_2$ is not, the following cards are used:


04
021.0
03-1.0
071.0
09-1.0
02
021.0
03-1.0


Upon completing the PROBLM card, the AUXEQN card and
the FINISH card as outlined in the User's Guide and putting
the cards in the correct order, one is ready to run HYTEST.
The outputs of major interest are the DESIGN cards and
the hypotheses. In the interest of brevity, only those two
outputs will be illustrated.
D. SAMPLE OUTPUT


Design cards suitable for BMD05V will be similar to the
sample shown below.


DESIGN 
DESIGN 
DESIGN 
DESIGN 
DESIGN 
DESIGN
DESIGN
DESIGN 


The column on the left below represents the vector q' of
the hypothesis H: q'b = m. The column on the right below
represents the vector q'H. If q' = q'H, the hypothesis is
testable. If q' ≠ q'H, the hypothesis is not testable.


THE HYPOTHESIS
OF INTEREST IS
(H0: Q'B = M)
 0.0
 1.0
-1.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0


THE HYPOTHESIS WHICH IS
ACTUALLY BEING TESTED
(H0: Q'HB = M)
 0.115
 0.748
 [illegible]
-0.084
-0.298
 0.038
 0.221
 0.099
 [illegible]
 [illegible]
 [illegible]
-0.061


WITHIN THE LIMITS OF COMPUTATIONAL ACCURACY,
HYPOTHESIS 1 CANNOT BE TESTED AS STATED ABOVE


THE HYPOTHESIS          THE HYPOTHESIS WHICH IS
OF INTEREST IS          ACTUALLY BEING TESTED
(H0: Q'B = M)           (H0: Q'HB = M)
     0.0                      0.0
     1.0                      1.0
    -1.0                     -1.0
     0.0                      0.0
     0.0                      0.0
     0.0                      0.0
     1.0                      1.0
     0.0                      0.0
    -1.0                     -1.0
     0.0                      0.0
     0.0                      0.0
     0.0                      0.0
     0.0                      0.0


WITHIN THE LIMITS OF COMPUTATIONAL ACCURACY,
HYPOTHESIS 2 CAN BE TESTED AS STATED ABOVE


As expected (Searle [1]), one hypothesis is estimable;
the other is not. Additional output information is
available from HYTEST if desired. The User's Guide
enumerates all of the options.
APPENDIX D
COMPUTER PROGRAM